This tutorial is for Matlab users, and aims to parallelize you
painlessly. The Beta Grid Cluster has 21 machines, with 1-2 processors
and 1-4GB RAM each (41 total Intel 3.06GHz Xeon CPUs). They run SuSE
Linux. Matlab 7.0.1 (R14) is installed. Kevin Liang is the
administrator.
Important: all processing is managed by the Sun N1 Grid Engine (SGE)
software. You submit jobs to this software, which then assigns nodes
in the cluster to process them. In fairness to other users, you do
not execute code on the individual nodes (hosts) yourself.
Before you use the cluster:
- Get yourself access. Talk to your supervisor, to one of the Beta
people, or to Kevin Liang. This was handled for me (Kevin LB
authorized me, and Dave Brent added me to the betaguests netgroup).
- Get network storage. Especially if your code is
data-intensive, make sure you have access to a big network scratch
disk, which is globally accessible within CS. In my case, I use
/cs/SCRATCH.
Using the cluster:
- Start on a Linux command-line. The SGE (cluster management)
software is presently only compiled for Linux, so you can only submit
jobs from a Linux machine (read: not from Windows, nor from the
department SunOS servers). Personally, I SSH to one of the cluster
hosts (e.g. aluminum.icics.ubc.ca) and do all my job submission from
there. From here on I assume you are on a Linux command-line, logged
into your CS account.
- Setup. Assuming you're using TCSH (otherwise, see Preparation),
punch in:
    source /cs/beta/lib/pkg/sge/beta_grid/common/settings.csh
which sets some environment variables. Now type qconf -ss, which
should echo the cluster host-list if you have been granted grid
access and are properly set up.
- Submit a job. Jobs are just shell scripts (csh, etc.) that carry
out your processing task. Here is a sample. Your script should change
the directory to wherever your code lives, invoke Matlab, and exit.
To submit the job, just invoke: qsub job.sh. It will assign your job a
number, delegate it to a cluster node, and then return. Workflow:
  - Write a shell script that cd's to your working directory, and
  invokes Matlab in non-interactive mode (see remarks below).
  - qsub myscript.sh
- Monitor your job. Shortly after your job is submitted, two files
are created in your home directory. Assuming your script filename is
batch.sh, and the job number is 12345, your job's standard output
will be redirected to ~/batch.sh.o12345, while its standard error is
written to ~/batch.sh.e12345. You can use the SGE program
qstat -u user_name to view the status of your job -- "r" means it's
running. You can also delete a single job, viz. qdel 12345, or all
your jobs with qdel -u user_name.
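
The submit-and-monitor steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the sample linked from this page. It is written in Bourne shell rather than csh, so the generated job script carries a "#$ -S /bin/sh" line asking SGE to run it with /bin/sh; whether the Beta cluster honours that request is an assumption. The function names make_job_script and job_log_files, the working directory you pass in, and the trailing "exit;" in the Matlab command are all illustrative choices.

```shell
#!/bin/sh
# Sketch only: prints a minimal Bourne-shell job script to stdout.
# Both arguments are placeholders you supply yourself.
make_job_script() {
    workdir=$1    # directory where your Matlab code lives (hypothetical)
    program=$2    # program.m, given without the .m extension
    cat <<EOF
#!/bin/sh
#\$ -S /bin/sh
cd "$workdir" || exit 1
matlab -r "$program; exit;" -nosplash -nodesktop -nojvm
EOF
}

# Prints the two log file names described above for a submitted job.
job_log_files() {
    script=$1     # script filename as passed to qsub, e.g. batch.sh
    jobid=$2      # job number reported by qsub, e.g. 12345
    printf '%s/%s.o%s\n' "$HOME" "$script" "$jobid"   # standard output
    printf '%s/%s.e%s\n' "$HOME" "$script" "$jobid"   # standard error
}
```

For example, make_job_script /cs/SCRATCH/yourname/project program > job.sh writes the script (the path is a placeholder), qsub job.sh submits it, and job_log_files job.sh 12345 prints the names of the two files to watch once qsub has reported job number 12345.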
Misc. remarks:
- csh != sh. C Shell syntax is substantially different from
Bourne Shell syntax. (Aug-18-2005: Apparently you can use
any scripting language installed on the cluster.)
- Matlab needs to be run in non-interactive mode, and the splash
screen + desktop (GUI) should also be disabled. This is
accomplished by:
    matlab -r "MATLAB_CODE;" -nosplash -nodesktop -nojvm
Obviously, your code shouldn't try pumping out figures for visible
display in this mode. MATLAB_CODE could be run program; for example,
where program.m is in the working directory you specified or on the
Matlab path.
- With qsub you can append arguments after the script filename --
they will be passed to your script. For example,
qsub myscript.sh 32 1964 (see my sample, where I accept 2 arguments).
- To see what's happening, use the utility tail on the standard
output/error files to print their last few lines (assuming your
script/Matlab code produces any output).
- If you use Windows like me, you'll have to re-Mex all your
external C/Fortran libraries on Linux.
- You can specify where the standard output/error files are
written (they default to your home directory), but I haven't played
with this feature yet.
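
The remark about passing arguments can be sketched as below: a small Bourne-shell helper that turns two script arguments into the Matlab statement handed to -r. This is an illustration under assumptions, not the author's sample. The helper name matlab_statement is made up, and "program" stands for a hypothetical program.m that accepts two numeric inputs.

```shell
#!/bin/sh
# Sketch: build the Matlab statement from two job-script arguments.
# "program" is a hypothetical Matlab function taking two numeric inputs.
matlab_statement() {
    if [ "$#" -ne 2 ]; then
        echo "usage: matlab_statement ARG1 ARG2" >&2
        return 1
    fi
    printf 'program(%s, %s); exit;\n' "$1" "$2"
}

# Inside a job script this would be used roughly as:
#   matlab -r "$(matlab_statement "$1" "$2")" -nosplash -nodesktop -nojvm
# and the job submitted with, e.g.:
#   qsub -S /bin/sh myscript.sh 32 1964
```

With the arguments 32 and 1964 the helper prints program(32, 1964); exit;. While the job runs, tail -f ~/myscript.sh.o12345 follows its standard output (12345 being whatever job number qsub reported). As for where those files go, qsub has -o and -e options for the standard output and standard error paths; they are untested here, so treat their use on this cluster as an assumption.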
Happy clustering.